Utilizing term proximity for blog post retrieval
Abstract
Term proximity is effective in many information retrieval (IR) research fields yet remains unexplored in blogosphere IR. The blogosphere is characterized by large amounts of noise, including incohesive, off-topic content and spam. Consequently, the classical bag-of-words unigram IR models are not reliable enough to provide robust and effective retrieval performance. In this article, we propose to boost blog post retrieval performance by employing term proximity information. We investigate a variety of popular and state-of-the-art proximity-based statistical IR models, including a proximity-based counting model, the Markov random field (MRF) model, and the divergence from randomness (DFR) multinomial model. Extensive experimentation on the standard TREC Blog06 test dataset demonstrates that the introduction of term proximity information is indeed beneficial to retrieval from the blogosphere. Results also indicate the superiority of the unordered bi-gram model with the sequential-dependence phrases over other variants of the proximity-based models. Finally, inspired by the effectiveness of proximity models, we extend our study by exploring the proximity evidence between query terms and opinionated terms. The resulting opinionated proximity model shows promising performance in the experiments.
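As a rough illustration of the kind of evidence an unordered bi-gram model with sequential-dependence phrases relies on, the sketch below counts how often adjacent query terms co-occur within a small window of a document, in either order. It is a minimal sketch and not the article's model: the function names, the pair-counting definition (position pairs closer than `window` tokens), and the default window size are all assumptions made for illustration, and no scoring or smoothing is applied.

```python
from collections import defaultdict


def term_positions(tokens):
    """Map each token to the list of positions where it occurs."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return positions


def unordered_pair_count(positions, t1, t2, window):
    """Count position pairs (i, j), t1 at i and t2 at j, with |i - j| < window.

    Order is ignored, so "phone ... apple" counts as well as "apple ... phone".
    This counting rule is an assumption chosen for illustration.
    """
    return sum(
        1
        for i in positions.get(t1, [])
        for j in positions.get(t2, [])
        if i != j and abs(i - j) < window
    )


def sequential_pair_counts(query_terms, doc_tokens, window=5):
    """Proximity counts for adjacent query-term pairs (sequential dependence)."""
    positions = term_positions(doc_tokens)
    return {
        (a, b): unordered_pair_count(positions, a, b, window)
        for a, b in zip(query_terms, query_terms[1:])
    }


# Hypothetical blog post and query, for illustration only.
doc = "the new apple phone review says the phone from apple is great".split()
counts = sequential_pair_counts(["apple", "phone", "review"], doc, window=3)
print(counts)
```

In a retrieval model, counts like these would typically be turned into a smoothed score and combined with the unigram score. How they are weighted is model-specific and is not shown here.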
Similar resources
University of Glasgow at TREC 2007: Experiments in Blog and Enterprise Tracks with Terrier
In TREC 2007, we participate in four tasks of the Blog and Enterprise tracks. We continue experiments using Terrier [14], our modular and scalable Information Retrieval (IR) platform, and the Divergence From Randomness (DFR) framework. In particular, for the Blog track opinion finding task, we propose a statistical term weighting approach to identify opinionated documents. An alternative approa...
Credibility Improves Topical Blog Post Retrieval
Topical blog post retrieval is the task of ranking blog posts with respect to their relevance for a given topic. To improve topical blog post retrieval we incorporate textual credibility indicators in the retrieval process. We consider two groups of indicators: post level (determined using information about individual blog posts only) and blog level (determined using information from the underl...
Investigating Learning Approaches for Blog Post Opinion Retrieval
Blog post opinion retrieval is the problem of identifying posts which express an opinion about a particular topic. Usually the problem is solved using a 3 step process in which relevant posts are first retrieved, then opinion scores are generated for each document, and finally the opinion and relevance scores are combined to produce a single ranking. In this paper, we study the effectiveness of...
Integrating Proximity to Subjective Sentences for Blog Opinion Retrieval
Opinion finding is a challenging retrieval task, where it has been shown that it is especially difficult to improve over a strongly performing topic-relevance baseline. In this paper, we propose a novel approach for opinion finding, which takes into account the proximity of query terms to subjective sentences in a document. We adapt two state-of-the-art opinion detection techniques to identify s...
Combining Language Model with Sentiment Analysis for Opinion Retrieval of Blog-Post
This paper describes our participation in Blog Opinion Retrieval task this year. We conduct experiments on “FirteX” platform that is developed by our lab. Language Model is used to retrieve related blog unit. Interactive Knowledge is adopted to expand query for retrieve blog unit include opinion. Then we introduce a novel extracting technology to extract text from retrieved blog-post. Finally, ...
Journal: JASIST
Volume: 64
Publication date: 2013